Summary. The sequence of the gene encoding the membrane protein of human 
coronavirus 229E (HCV 229E) has been determined. The primary translation 
product, deduced from the DNA sequence, is a polypeptide of 225 amino acids 
with a predicted molecular weight of 26,000. The polypeptide has 3 potential 
N-glycosylation sites. Many structural similarities with the membrane proteins 
of other coronaviruses can be recognized. 


The coronaviruses are a group of positive-strand RNA viruses that cause a 
wide spectrum of disease in mammals and birds [21]. The human coronaviruses 
are thought to cause 10-20% of all common colds, and about half of these are 
associated with the human coronavirus strain HCV 229E [4, 5]. The HCV 229E 
virion comprises a genomic RNA of approximately 6 x 10^6 molecular weight 
(mol. wt.), a lipid envelope and three major proteins: a phosphorylated 
protein of 50,000 mol. wt. associated with the genome, a glycosylated 
180,000 mol. wt. protein and a family of polypeptides with estimated molecular 
weights of 25,000, 23,000, and 21,000 [7, 16]. It is clear that these proteins 
represent the nucleocapsid (N), surface (S) and membrane (M) proteins 
characteristic of coronaviruses [reviewed in 19]. 

The replication of HCV 229E appears to follow the pattern which has now 
been well established for several coronaviruses, notably avian infectious 
bronchitis virus (IBV) and murine hepatitis virus (MHV) [reviewed in 19]. In 
the HCV 229E infected cell a set of six 3’-coterminal subgenomic mRNAs is 
synthesized, the smallest of which, mRNA 7, encodes the nucleocapsid protein 
[22, 17]. As for other coronaviruses, the synthesis of the HCV 229E subgenomic 
mRNAs appears to involve a leader-primed mechanism of discontinuous 
transcription in which a specific intergenic sequence (TCTAAAC for MHV and 
HCV) plays an important role [17]. In this paper we report the nucleotide 
sequence of the genomic region of HCV 229E which encodes the membrane 
protein. This region is adjacent to the 3’ terminal nucleocapsid protein gene 
and corresponds to the unique region of mRNA 6. 

HCV 229E virus was obtained from Dr. D. A. J. Tyrrell of the MRC 
Common Cold Unit, Salisbury, U.K., plaque purified and propagated in C16 
cells at 33 °C as described previously [11]. The cytoplasmic RNA from 10^8 
cells which had been infected 48 h previously at an m.o.i. of 3 was extracted 
by standard procedures [18] and the poly A-containing fraction selected by 
hybridization to poly U-Sepharose. cDNA synthesis was performed according to 
the procedure of Gubler and Hoffman [3] using 5 µg of RNA primed with 
random hexanucleotides. The double-stranded cDNA was ligated to EcoRI 
linkers and cloned into the EcoRI site of the pBluescript vector, pKS+ 
(Stratagene, Federal Republic of Germany). After transformation, a library of 
recombinant clones was established and screened by colony hybridization with 
a synthetic oligonucleotide, 5’ TTGAACATTCCAATAGCC 3’, which is 
complementary to a region 165-183 bases from the 5’ end of the HCV 229E 
nucleocapsid gene [Myint et al., in prep.; 17]. This search identified the clone 
2F7, which hybridized to all 7 virus-specific mRNA species in HCV 229E 
infected cells (data not shown). The cDNA insert of the clone 2F7 was 
sequenced completely on both strands using restriction endonuclease fragments 
subcloned into the M13 vector, mp18 [9], and the dideoxyribonucleotide chain 
termination method of Sanger [15]. Universal and sequence-specific primers 
synthesized on a Cyclone DNA synthesizer (MilliGen, Federal Republic of 
Germany) were used. The sequence data were assembled and analysed using 
the programs of Staden [20] and the University of Wisconsin Genetics Computer 
Group [2]. 
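
Because the screening oligonucleotide is complementary to the nucleocapsid gene, the stretch it anneals to in the sense strand is its reverse complement. The following minimal Python sketch derives that target from the probe quoted above; the names `reverse_complement`, `PROBE` and `TARGET` are illustrative and do not come from the software used in this study.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Screening oligonucleotide quoted in the text (5' to 3').
PROBE = "TTGAACATTCCAATAGCC"

# Sense-strand sequence (5' to 3') to which the probe should anneal.
TARGET = reverse_complement(PROBE)

if __name__ == "__main__":
    print(TARGET)
```

Searching a candidate cDNA sequence for `TARGET` is then a plain substring search, which is a quick way to confirm in silico that a clone carries the region recognized by the probe.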

The sequence of the clone 2F7 corresponding to the genomic region 
representing the unique region of mRNA 6 is shown in Fig. 1. This region 
contains a single large open reading frame (ORF) of 678 nucleotides. The open 
reading frame is flanked on either side by the nucleotide sequence TCTAAAC 
(Fig. 1, nucleotide positions 33-39 and 722-728) which would be at or near the 
sites of fusion between the leader RNA and the mRNA 6 and mRNA 7 coding 
regions, respectively [17]. Fourteen nucleotides downstream from the large 
ORF is an AUG codon which represents the initiation codon of the nucleocapsid 
gene [Myint et al., in prep.] and 12 nucleotides upstream from the large ORF 
is a TAA codon which represents the termination codon of the 5’ proximal 
gene [Raabe et al., in prep.]. 
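
The coordinates quoted above can be rechecked with a short Python sketch. It assumes the 747-nucleotide sequence as transcribed from Fig. 1 (a few characters are ambiguous in the available copy, so the string below is our reading, not an authoritative record), and the helper names `find_all` and `first_orf_after` are hypothetical.

```python
# Nucleotide sequence transcribed from Fig. 1, positions 1-747.
# Assumption: ambiguous characters in the printed figure were read as shown.
SEQ = (
    "CATAGACCCTTTCCCTAAACGAGTTATTGATTTCTAAACTAAACGACAATGTCAAATGAC"
    "AATTGTACGGGTGACATTGTCACCCATTTGAAGAATTGGAATTTTGGTTGGAATGTTATT"
    "CTAACCATATTCATTGTTATTCTTCAGTTTGGACACTATAAATACTCCAGATTGTTTTAT"
    "GGTTTGAAGATGCTTGTACTGTGGCTTCTTTGGCCACTCGTACTTGCTTTGTCAATCTTT"
    "GACACCTGGGCTAATTGGGATTCTAATTGGGCCTTTGTTGCATTTAGCTTTTTTATGGCC"
    "GTATCAACACTCGTTATGTGGGTGATGTACTTCGCAAACAGTTTCAGACTTTTCCGACGT"
    "GCTCGAACTTTTTGGGCATGGAATCCTGAGGTTAATGCAATCACTGTCACAACCGTGTTG"
    "GGACAGACATACTATCAACCCATTCAACAAGCTCCAACAGGCATTACTGTGACCTTGCTG"
    "AGCGGCGTGCTTTACGTTGACGGACATAGATTGGCTTCAGGTGTTCAGGTTCATAACCTA"
    "CCTGAATACATGACAGTTGCCGTGCCGAGCACTACTATAATTTATAGTAGAGTCGGAAGG"
    "TCCGTAAATTCACAAAATTGCACAGGCTGGGTTTTCTACGTACGAGTAAAACACGGTGAT"
    "TTTTCTGCAGTGAGCTCTCCCATGAGCAACATGACAGAAAACGAAAGATTGCTTCATTTT"
    "TTCTAAACTGAACGAAAAGATGGCTAC"
)

MOTIF = "TCTAAAC"  # intergenic sequence named in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_all(seq, motif):
    """Return the 1-based start position of every occurrence of motif."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i + 1)
        i = seq.find(motif, i + 1)
    return hits


def first_orf_after(seq, start0):
    """Return (start, end), 1-based inclusive, of the ORF opened by the first
    ATG at or after 0-based index start0, stop codon included; None if absent."""
    atg = seq.find("ATG", start0)
    if atg == -1:
        return None
    for i in range(atg, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return atg + 1, i + 3
    return None


motif_starts = find_all(SEQ, MOTIF)
# Look for the ORF immediately downstream of the first intergenic motif.
orf_start, orf_end = first_orf_after(SEQ, motif_starts[0] - 1 + len(MOTIF))
orf_length = orf_end - orf_start + 1
# First ATG after the ORF: candidate nucleocapsid initiation codon.
next_atg = SEQ.find("ATG", orf_end) + 1

if __name__ == "__main__":
    print(motif_starts, orf_start, orf_end, orf_length, next_atg)
```

With this transcription the motif is found at the two positions given in the text, and the ORF length includes the termination codon, so the number of encoded residues is the length divided by three, minus one.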

The large open reading frame predicts a polypeptide of 225 amino acids 
with a molecular weight of 26,000. The predicted polypeptide has several features 
which are characteristic of a coronavirus membrane protein. Firstly, there are 
three potential N-linked glycosylation sites (Fig. 1, amino acid positions 5, 190, 
and 214), one of which is near the amino-terminus. Secondly, the polypeptide 
displays three internal hydrophobic domains (Fig. 1, amino acid positions 17-37, 
48-63, and 75-95) within the amino-terminal half and a relatively hydrophilic 
carboxy-terminus (Fig. 1, amino acid positions 213-221). Thirdly, the polypeptide 
is slightly basic with a net charge of +4 at neutral pH. 

  1 CATAGACCCTTTCCCTAAACGAGTTATTGATTTCTAAACTAAACGACAATGTCAAATGAC  60
                                                    M  S  N  D     4

 61 AATTGTACGGGTGACATTGTCACCCATTTGAAGAATTGGAATTTTGGTTGGAATGTTATT 120
  5 N  C  T  G  D  I  V  T  H  L  K  N  W  N  F  G  W  N  V  I    24
    ●

121 CTAACCATATTCATTGTTATTCTTCAGTTTGGACACTATAAATACTCCAGATTGTTTTAT 180
 25 L  T  I  F  I  V  I  L  Q  F  G  H  Y  K  Y  S  R  L  F  Y    44

181 GGTTTGAAGATGCTTGTACTGTGGCTTCTTTGGCCACTCGTACTTGCTTTGTCAATCTTT 240
 45 G  L  K  M  L  V  L  W  L  L  W  P  L  V  L  A  L  S  I  F    64

241 GACACCTGGGCTAATTGGGATTCTAATTGGGCCTTTGTTGCATTTAGCTTTTTTATGGCC 300
 65 D  T  W  A  N  W  D  S  N  W  A  F  V  A  F  S  F  F  M  A    84

301 GTATCAACACTCGTTATGTGGGTGATGTACTTCGCAAACAGTTTCAGACTTTTCCGACGT 360
 85 V  S  T  L  V  M  W  V  M  Y  F  A  N  S  F  R  L  F  R  R   104

361 GCTCGAACTTTTTGGGCATGGAATCCTGAGGTTAATGCAATCACTGTCACAACCGTGTTG 420
105 A  R  T  F  W  A  W  N  P  E  V  N  A  I  T  V  T  T  V  L   124

421 GGACAGACATACTATCAACCCATTCAACAAGCTCCAACAGGCATTACTGTGACCTTGCTG 480
125 G  Q  T  Y  Y  Q  P  I  Q  Q  A  P  T  G  I  T  V  T  L  L   144

481 AGCGGCGTGCTTTACGTTGACGGACATAGATTGGCTTCAGGTGTTCAGGTTCATAACCTA 540
145 S  G  V  L  Y  V  D  G  H  R  L  A  S  G  V  Q  V  H  N  L   164

541 CCTGAATACATGACAGTTGCCGTGCCGAGCACTACTATAATTTATAGTAGAGTCGGAAGG 600
165 P  E  Y  M  T  V  A  V  P  S  T  T  I  I  Y  S  R  V  G  R   184

601 TCCGTAAATTCACAAAATTGCACAGGCTGGGTTTTCTACGTACGAGTAAAACACGGTGAT 660
185 S  V  N  S  Q  N  C  T  G  W  V  F  Y  V  R  V  K  H  G  D   204
                   ●

661 TTTTCTGCAGTGAGCTCTCCCATGAGCAACATGACAGAAAACGAAAGATTGCTTCATTTT 720
205 F  S  A  V  S  S  P  M  S  N  M  T  E  N  E  R  L  L  H  F   224
                               ●

721 TTCTAAACTGAACGAAAAGATGGCTAC 747
225 F  *

Fig. 1. Nucleotide sequence of the HCV 229E membrane protein gene. The numbering of 
the nucleotide sequence is arbitrary. The predicted amino acid sequence of the membrane 
protein is shown in the single letter code and the positions of the three potential 
N-glycosylation sites are marked (●). The intergenic sequence TCTAAAC is overlined. The 
positions of the nucleocapsid gene initiation codon and the 5’ upstream ORF termination 
signal are boxed. 
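
Two of the features listed above, the N-glycosylation sites and the net charge, can be rechecked directly from the predicted sequence. The Python sketch below assumes the amino acid sequence as read from Fig. 1, uses the common N-X-S/T sequon rule (X not proline), and estimates charge simply as (Lys + Arg) minus (Asp + Glu), ignoring histidine and the termini. These rules are our assumptions; the method actually used for the published values is not stated.

```python
import re

# Predicted M protein sequence, one string per row of Fig. 1 (our reading).
PROTEIN = "".join([
    "MSND",
    "NCTGDIVTHLKNWNFGWNVI",
    "LTIFIVILQFGHYKYSRLFY",
    "GLKMLVLWLLWPLVLALSIF",
    "DTWANWDSNWAFVAFSFFMA",
    "VSTLVMWVMYFANSFRLFRR",
    "ARTFWAWNPEVNAITVTTVL",
    "GQTYYQPIQQAPTGITVTLL",
    "SGVLYVDGHRLASGVQVHNL",
    "PEYMTVAVPSTTIIYSRVGR",
    "SVNSQNCTGWVFYVRVKHGD",
    "FSAVSSPMSNMTENERLLHF",
    "F",
])

# N-X-S/T with X != P; the lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def glycosylation_sites(protein):
    """Return 1-based positions of Asn residues in N-X-S/T sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def net_charge(protein):
    """Crude net charge at neutral pH: (K + R) - (D + E)."""
    positive = protein.count("K") + protein.count("R")
    negative = protein.count("D") + protein.count("E")
    return positive - negative


SITES = glycosylation_sites(PROTEIN)
CHARGE = net_charge(PROTEIN)

if __name__ == "__main__":
    print(len(PROTEIN), SITES, CHARGE)
```

Under these assumptions the sketch reproduces the three site positions and the net charge quoted in the text, which also serves as a consistency check on the transcription of the figure.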

A comparison of the amino acid sequences of the M proteins of HCV 229E, 
transmissible gastroenteritis virus (TGEV), MHV, bovine coronavirus (BCV), and 
IBV (Fig. 2) confirms that the HCV protein has a high sequence similarity to 
the other coronavirus proteins (HCV/TGEV 68%, HCV/MHV 59%, HCV/BCV 57%, 
and HCV/IBV 52%). Also, from this comparison it is evident that the HCV 229E 
M protein, in contrast to the TGEV protein, does not possess a putative 
N-terminal signal sequence and, as has been noted previously [6], all 
coronavirus M proteins, including that of HCV 229E, display a structurally 
similar central domain (Fig. 2, amino acid positions 127-152). 

Fig. 2. Sequence similarity of the HCV 229E, TGEV, MHV, BCV, and IBV M proteins. 
The sequences were aligned and percentage similarities determined using the program GAP 
of the UWGCG sequence analysis software. The positions of the hydrophobic transmembrane 
domains 1, 2, and 3 are overlined. Positions with identical amino acids are indicated 
(■) as well as those which are designated as similar by the UWGCG program SIMPLIFY 
(□). The numbering of the amino acids is arbitrary. The M protein sequences are taken 
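
To make the idea of a pairwise percentage concrete, the Python sketch below computes percent identity over the ungapped columns of a pre-aligned pair. It is not the GAP program: the values reported here are similarities, which also count conservative substitutions, so identity is a stricter measure and will generally be lower. The two aligned strings are toy data invented for illustration only.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments (toy data, not taken from Fig. 2).
TOY_A = "MSND-NCT"
TOY_B = "MSSDANCT"

if __name__ == "__main__":
    print(round(percent_identity(TOY_A, TOY_B), 1))
```

Whether gapped columns belong in the denominator is a convention that differs between programs, which is one reason percentages from different tools are not directly comparable.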
On the basis of structural and biochemical data Rottier and coworkers 
[12-14] have proposed a model for the membrane topology of the MHV M protein. 
In this model a short glycosylated region of the amino terminus is on the outside 
of the virion. The protein then enters and traverses the virion membrane three 
times (corresponding to the hydrophobic regions 1, 2, and 3) before emerging 
from the cytoplasmic face of the lipid bilayer. Basic domains in the 
carboxy-terminal region of the protein are then proposed to interact with the 
nucleocapsid structure during virus maturation. The data presented here are 
fully consistent with this model. 
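
One way to see why the HCV 229E sequence fits a three-span topology is to compare the average hydropathy of the three hydrophobic domains with that of the hydrophilic carboxy-terminal stretch. The Python sketch below uses the Kyte-Doolittle hydropathy scale, which is our choice for illustration and not necessarily the analysis performed for this report; the sequence is our reading of Fig. 1 and the helper `mean_hydropathy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Predicted M protein sequence, one string per row of Fig. 1 (our reading).
PROTEIN = "".join([
    "MSND",
    "NCTGDIVTHLKNWNFGWNVI", "LTIFIVILQFGHYKYSRLFY",
    "GLKMLVLWLLWPLVLALSIF", "DTWANWDSNWAFVAFSFFMA",
    "VSTLVMWVMYFANSFRLFRR", "ARTFWAWNPEVNAITVTTVL",
    "GQTYYQPIQQAPTGITVTLL", "SGVLYVDGHRLASGVQVHNL",
    "PEYMTVAVPSTTIIYSRVGR", "SVNSQNCTGWVFYVRVKHGD",
    "FSAVSSPMSNMTENERLLHF", "F",
])

# Regions quoted in the text (1-based, inclusive).
HYDROPHOBIC_DOMAINS = [(17, 37), (48, 63), (75, 95)]
C_TERMINAL_REGION = (213, 221)


def mean_hydropathy(protein, start, end):
    """Mean Kyte-Doolittle value of residues start..end (1-based, inclusive)."""
    segment = protein[start - 1:end]
    return sum(KD[aa] for aa in segment) / len(segment)


DOMAIN_MEANS = [mean_hydropathy(PROTEIN, s, e) for s, e in HYDROPHOBIC_DOMAINS]
C_TERMINAL_MEAN = mean_hydropathy(PROTEIN, *C_TERMINAL_REGION)

if __name__ == "__main__":
    for (s, e), m in zip(HYDROPHOBIC_DOMAINS, DOMAIN_MEANS):
        print(f"{s}-{e}: {m:.2f}")
    print(f"{C_TERMINAL_REGION[0]}-{C_TERMINAL_REGION[1]}: {C_TERMINAL_MEAN:.2f}")
```

On this scale all three domains average above zero and the carboxy-terminal stretch averages below zero, in line with the membrane-spanning and exposed regions of the proposed model. A sliding-window profile would give a finer picture, but segment means are enough to illustrate the contrast.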